OCR exploration server

Optical character recognition of handwritten Arabic using hidden Markov models

Internal identifier: 000551 (Main/Exploration); previous: 000550; next: 000552

Authors: Mohannad M. Aulama [Jordan]; Asem M. Natsheh [Jordan]; Gheith A. Abandah [Jordan]; Mohammed M. Olama [United States]

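Source: Proceedings of SPIE, the International Society for Optical Engineering (Proc. SPIE Int. Soc. Opt. Eng.), ISSN 0277-786X, 2011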

RBID : Pascal:11-0436660

French descriptors: Reconnaissance forme; Algorithme; 0130C; 4230S

English descriptors: Algorithms; Pattern recognition

Abstract

The problem of optical character recognition (OCR) of handwritten Arabic has not yet received a satisfactory solution. This paper develops an Arabic OCR algorithm based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which yields improved and more robust recognition of characters at the sub-word level. Integrating HMMs is a further step along the OCR trends currently being researched in the literature. The proposed approach exploits the structure of characters in the Arabic language, in addition to their extracted features, to achieve improved recognition rates. Statistical information about the Arabic language is first extracted and then used to estimate the probabilistic parameters of the HMM. A new custom implementation of the HMM is developed in this study: the transition matrix is built from a large collected corpus, and the emission matrix is built from the results obtained with the extracted character features. Recognition is performed with the Viterbi algorithm, which selects the most probable sequence of sub-words. The model was implemented to recognize the sub-word unit of Arabic text, so that the recognition rate is no longer tied to the worst recognition rate of any single character but instead draws on the overall structure of the Arabic language. Numerical results show a potentially large recognition improvement from the proposed algorithms.
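
The idea summarized in the abstract (transition probabilities estimated from a text corpus, emission scores coming from a character-level classifier, and Viterbi decoding over a sub-word) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the add-one smoothing, the choice of individual characters as hidden states, the toy Latin alphabet standing in for Arabic letters, and all numbers below are assumptions made only for illustration.

```python
import math
from collections import Counter


def log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def estimate_transitions(corpus, alphabet, smoothing=1.0):
    """Estimate start and transition probabilities from character bigram counts.

    Assumes every character in `corpus` belongs to `alphabet`.
    Add-`smoothing` smoothing keeps unseen bigrams at a non-zero probability.
    """
    start, bigram = Counter(), Counter()
    for word in corpus:
        if not word:
            continue
        start[word[0]] += 1
        for a, b in zip(word, word[1:]):
            bigram[(a, b)] += 1
    n = len(alphabet)
    total_start = sum(start.values())
    pi = {c: (start[c] + smoothing) / (total_start + smoothing * n) for c in alphabet}
    A = {}
    for a in alphabet:
        row_total = sum(bigram[(a, b)] for b in alphabet)
        A[a] = {b: (bigram[(a, b)] + smoothing) / (row_total + smoothing * n)
                for b in alphabet}
    return pi, A


def viterbi(emissions, pi, A, alphabet):
    """Return the most probable character sequence, computed in log space.

    `emissions[t][c]` is the (hypothetical) classifier likelihood of the
    observed glyph at position t given character c.
    """
    delta = {c: log(pi[c]) + log(emissions[0][c]) for c in alphabet}
    backpointers = []
    for e in emissions[1:]:
        new_delta, ptr = {}, {}
        for c in alphabet:
            best_prev = max(alphabet, key=lambda p: delta[p] + log(A[p][c]))
            new_delta[c] = delta[best_prev] + log(A[best_prev][c]) + log(e[c])
            ptr[c] = best_prev
        delta = new_delta
        backpointers.append(ptr)
    # Trace back from the best final state.
    path = [max(alphabet, key=lambda c: delta[c])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


# Toy data (assumed): a tiny corpus and ambiguous classifier scores.
alphabet = "abc"
corpus = ["ab", "ab", "abc", "cab"]
emissions = [
    {"a": 0.90, "b": 0.05, "c": 0.05},  # first glyph: clearly "a"
    {"a": 0.10, "b": 0.40, "c": 0.50},  # second glyph: classifier slightly prefers "c"
]

pi, A = estimate_transitions(corpus, alphabet)
decoded = viterbi(emissions, pi, A, alphabet)
# Baseline without the language model: best character at each position.
greedy = "".join(max(alphabet, key=lambda c: e[c]) for e in emissions)
print(decoded, greedy)
```

In this toy case the per-character baseline picks "c" for the second glyph, while the corpus statistics make the bigram "ab" far more likely, so the decoder can override the weaker classifier score. This is the sense in which sub-word decoding need not be limited by the worst single-character recognition rate.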


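Affiliations:

Computer Engineering Department, University of Jordan, Amman 11942, Jordan (Aulama, Natsheh, Abandah)
CSED, Oak Ridge National Laboratory, PO Box 2008, MS-6085, Oak Ridge, TN 37831, USA (Olama)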


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Optical character recognition of handwritten Arabic using hidden Markov models</title>
<author>
<name sortKey="Aulama, Mohannad M" sort="Aulama, Mohannad M" uniqKey="Aulama M" first="Mohannad M." last="Aulama">Mohannad M. Aulama</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Engineering Department, University of Jordan</s1>
<s2>Amman 11942</s2>
<s3>JOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Jordanie</country>
<wicri:noRegion>Amman 11942</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Natsheh, Asem M" sort="Natsheh, Asem M" uniqKey="Natsheh A" first="Asem M." last="Natsheh">Asem M. Natsheh</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Engineering Department, University of Jordan</s1>
<s2>Amman 11942</s2>
<s3>JOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Jordanie</country>
<wicri:noRegion>Amman 11942</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Abandah, Gheith A" sort="Abandah, Gheith A" uniqKey="Abandah G" first="Gheith A." last="Abandah">Gheith A. Abandah</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Engineering Department, University of Jordan</s1>
<s2>Amman 11942</s2>
<s3>JOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Jordanie</country>
<wicri:noRegion>Amman 11942</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Olama, Mohammed M" sort="Olama, Mohammed M" uniqKey="Olama M" first="Mohammed M." last="Olama">Mohammed M. Olama</name>
<affiliation wicri:level="2">
<inist:fA14 i1="02">
<s1>CSED, Oak Ridge National Laboratory, PO Box 2008, MS-6085</s1>
<s2>Oak Ridge, TN 37831</s2>
<s3>USA</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Tennessee</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">11-0436660</idno>
<date when="2011">2011</date>
<idno type="stanalyst">PASCAL 11-0436660 INIST</idno>
<idno type="RBID">Pascal:11-0436660</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000117</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000656</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000105</idno>
<idno type="wicri:doubleKey">0277-786X:2011:Aulama M:optical:character:recognition</idno>
<idno type="wicri:Area/Main/Merge">000557</idno>
<idno type="wicri:Area/Main/Curation">000551</idno>
<idno type="wicri:Area/Main/Exploration">000551</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Optical character recognition of handwritten Arabic using hidden Markov models</title>
<author>
<name sortKey="Aulama, Mohannad M" sort="Aulama, Mohannad M" uniqKey="Aulama M" first="Mohannad M." last="Aulama">Mohannad M. Aulama</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Engineering Department, University of Jordan</s1>
<s2>Amman 11942</s2>
<s3>JOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Jordanie</country>
<wicri:noRegion>Amman 11942</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Natsheh, Asem M" sort="Natsheh, Asem M" uniqKey="Natsheh A" first="Asem M." last="Natsheh">Asem M. Natsheh</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Engineering Department, University of Jordan</s1>
<s2>Amman 11942</s2>
<s3>JOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Jordanie</country>
<wicri:noRegion>Amman 11942</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Abandah, Gheith A" sort="Abandah, Gheith A" uniqKey="Abandah G" first="Gheith A." last="Abandah">Gheith A. Abandah</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Engineering Department, University of Jordan</s1>
<s2>Amman 11942</s2>
<s3>JOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Jordanie</country>
<wicri:noRegion>Amman 11942</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Olama, Mohammed M" sort="Olama, Mohammed M" uniqKey="Olama M" first="Mohammed M." last="Olama">Mohammed M. Olama</name>
<affiliation wicri:level="2">
<inist:fA14 i1="02">
<s1>CSED, Oak Ridge National Laboratory, PO Box 2008, MS-6085</s1>
<s2>Oak Ridge, TN 37831</s2>
<s3>USA</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Tennessee</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Pattern recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance forme</term>
<term>Algorithme</term>
<term>0130C</term>
<term>4230S</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The problem of optical character recognition (OCR) of handwritten Arabic has not received a satisfactory solution yet. In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which results in an improved and more robust recognition of characters at the sub-word level. Integrating the HMMs represents another step of the overall OCR trends being currently researched in the literature. The proposed approach exploits the structure of characters in the Arabic language in addition to their extracted features to achieve improved recognition rates. Useful statistical information of the Arabic language is initially extracted and then used to estimate the probabilistic parameters of the mathematical HMM. A new custom implementation of the HMM is developed in this study, where the transition matrix is built based on the collected large corpus, and the emission matrix is built based on the results obtained via the extracted character features. The recognition process is triggered using the Viterbi algorithm which employs the most probable sequence of sub-words. The model was implemented to recognize the sub-word unit of Arabic text raising the recognition rate from being linked to the worst recognition rate for any character to the overall structure of the Arabic language. Numerical results show that there is a potentially large recognition improvement by using the proposed algorithms.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Jordanie</li>
<li>États-Unis</li>
</country>
<region>
<li>Tennessee</li>
</region>
</list>
<tree>
<country name="Jordanie">
<noRegion>
<name sortKey="Aulama, Mohannad M" sort="Aulama, Mohannad M" uniqKey="Aulama M" first="Mohannad M." last="Aulama">Mohannad M. Aulama</name>
</noRegion>
<name sortKey="Abandah, Gheith A" sort="Abandah, Gheith A" uniqKey="Abandah G" first="Gheith A." last="Abandah">Gheith A. Abandah</name>
<name sortKey="Natsheh, Asem M" sort="Natsheh, Asem M" uniqKey="Natsheh A" first="Asem M." last="Natsheh">Asem M. Natsheh</name>
</country>
<country name="États-Unis">
<region name="Tennessee">
<name sortKey="Olama, Mohammed M" sort="Olama, Mohammed M" uniqKey="Olama M" first="Mohammed M." last="Olama">Mohammed M. Olama</name>
</region>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000551 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000551 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:11-0436660
   |texte=   Optical character recognition of handwritten Arabic using hidden Markov models
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024